196 research outputs found

    The genetic organisation of prokaryotic two-component system signalling pathways

    Background: Two-component systems (TCSs) are modular and diverse signalling pathways, involving a stimulus-responsive transfer of phosphoryl groups from transmitter to partner receiver domains. TCS gene and domain organisation are both potentially informative regarding biological function, interaction partnerships and molecular mechanisms. However, there is currently little understanding of the relationships between domain architecture, gene organisation and TCS pathway structure. Results: Here we classify the gene and domain organisation of TCS gene loci from 1405 prokaryotic replicons (>40,000 TCS proteins). We find that 200 bp is the most appropriate distance cut-off for defining whether two TCS genes are functionally linked. More than 90% of all TCS gene loci encode just one or two transmitter and/or receiver domains; however, numerous other geometries exist, often with large numbers of encoded TCS domains. Such information provides insights into the distribution of TCS domains between genes, and within genes. As expected, the organisation of TCS genes and domains is affected by phylogeny, and plasmid-encoded TCSs exhibit differences in organisation from their chromosomally-encoded counterparts. Conclusions: We provide here an overview of the genomic and genetic organisation of TCS domains, as a resource for further research. We also propose novel metrics that build upon TCS gene/domain organisation data and allow comparisons between genomic complements of TCSs. In particular, 'percentage orphaned TCS genes' (or 'Dissemination') and 'percentage of complex loci' (or 'Sophistication') appear to be useful discriminators, and to reflect mechanistic aspects of TCS organisation not captured by existing metrics.
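The 200 bp cut-off and the 'percentage orphaned TCS genes' metric can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene record layout, the function names, and the reading of "orphaned" as "alone in its locus" are assumptions made for illustration, and strand and replicon circularity are ignored.

```python
from typing import List, Tuple

# Hypothetical gene record: (name, start, end) in bp on a single replicon.
Gene = Tuple[str, int, int]


def group_into_loci(genes: List[Gene], cutoff: int = 200) -> List[List[Gene]]:
    """Group genes into loci when the gap to the previous locus is <= cutoff bp."""
    loci: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g[1]):
        # Gap is measured from the furthest end seen so far in the current locus,
        # so overlapping or nested genes are handled as linked.
        if loci and gene[1] - max(g[2] for g in loci[-1]) <= cutoff:
            loci[-1].append(gene)
        else:
            loci.append([gene])
    return loci


def percent_orphaned(loci: List[List[Gene]]) -> float:
    """Percentage of genes that sit alone in their locus (assumed definition)."""
    total = sum(len(locus) for locus in loci)
    orphans = sum(1 for locus in loci if len(locus) == 1)
    return 100.0 * orphans / total if total else 0.0


if __name__ == "__main__":
    example = [
        ("hk1", 100, 1500), ("rr1", 1550, 2200),      # 50 bp gap: linked
        ("rr2", 5000, 5700),                          # isolated
        ("hk2", 9000, 10400), ("rr3", 10650, 11300),  # 250 bp gap: not linked
    ]
    loci = group_into_loci(example)
    print(len(loci), percent_orphaned(loci))
```

Changing `cutoff` shows how sensitive the locus count and the orphan percentage are to the chosen distance.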

    SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data

    Background: Illumina's second-generation sequencing platform is playing an increasingly prominent role in modern DNA and RNA sequencing efforts. However, rapid, simple, standardized and independent measures of run quality are currently lacking, as are tools to process sequences for use in downstream applications based on read-level quality data. Results: We present SolexaQA, a user-friendly software package designed to generate detailed statistics and at-a-glance graphics of sequence data quality both quickly and in an automated fashion. This package contains associated software to trim sequences dynamically using the quality scores of bases within individual reads. Conclusion: The SolexaQA package produces standardized outputs within minutes, thus facilitating ready comparison between flow cell lanes and machine runs, as well as providing immediate diagnostic information to guide the manipulation of sequence data for downstream analyses.
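Quality-based dynamic trimming of a read can be sketched as follows. This is one common approach (keep the longest contiguous run of bases at or above a quality threshold), not SolexaQA's own code; the function names, the default threshold of 20 and the Phred+33 encoding are assumptions.

```python
from typing import Sequence, Tuple


def longest_good_segment(quals: Sequence[int], min_q: int = 20) -> Tuple[int, int]:
    """Return (start, end) of the longest run with quality >= min_q (end exclusive)."""
    best_start, best_end = 0, 0
    run_start = 0
    for i, q in enumerate(quals):
        if q < min_q:
            # A low-quality base closes the current run.
            if i - run_start > best_end - best_start:
                best_start, best_end = run_start, i
            run_start = i + 1
    # Close the run that reaches the end of the read.
    if len(quals) - run_start > best_end - best_start:
        best_start, best_end = run_start, len(quals)
    return best_start, best_end


def trim_read(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Trim a read and its quality string to the longest high-quality segment."""
    scores = [ord(c) - offset for c in qual]  # assumes Phred+33 encoding
    start, end = longest_good_segment(scores, min_q)
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    # Qualities 30,30,10,35,35,35,5,30 encoded as Phred+33 characters.
    print(trim_read("ACGTACGT", "??+DDD&?"))
```

A read with no base at or above the threshold comes back as an empty string, which a downstream length filter could then discard.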

    NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data

    Next generation sequencing (NGS) technologies provide a high-throughput means to generate large amounts of sequence data. However, quality control (QC) of sequence data generated from these technologies is extremely important for meaningful downstream analysis. Further, highly efficient and fast processing tools are required to handle the large volume of datasets. Here, we have developed an application, NGS QC Toolkit, for quality check and filtering of high-quality data. This toolkit is a standalone and open source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application have been implemented in the Perl programming language. The toolkit comprises user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options have been provided to facilitate the QC at user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data to facilitate better downstream analysis.

    interPopula: a Python API to access the HapMap Project dataset

    Background: The HapMap project is a publicly available catalogue of common genetic variants that occur in humans, currently including several million SNPs across 1115 individuals spanning 11 different populations. This important database does not provide any programmatic access to the dataset; furthermore, no standard relational database interface is provided. Results: interPopula is a Python API to access the HapMap dataset. interPopula provides integration facilities with both the Python ecology of software (e.g. Biopython and matplotlib) and other relevant human population datasets (e.g. Ensembl gene annotation and UCSC Known Genes). A set of guidelines and code examples to address possible inconsistencies across heterogeneous data sources is also provided. Conclusions: interPopula is a straightforward and flexible Python API that facilitates the construction of scripts and applications that require access to the HapMap dataset.

    Pydna: a simulation and documentation tool for DNA assembly strategies using python

    Background: Recent advances in synthetic biology have provided tools to efficiently construct complex DNA molecules which are an important part of many molecular biology and biotechnology projects. The planning of such constructs has traditionally been done manually using a DNA sequence editor, which becomes error-prone as the scale and complexity of the construction increase. A human-readable formal description of cloning and assembly strategies, which also allows for automatic computer simulation and verification, would therefore be a valuable tool. Results: We have developed pydna, an extensible, free and open source Python library for simulating basic molecular biology DNA unit operations such as restriction digestion, ligation, PCR, primer design, Gibson assembly and homologous recombination. A cloning strategy expressed as a pydna script provides a description that is complete, unambiguous and stable. Execution of the script automatically yields the sequence of the final molecule(s) and that of any intermediate constructs. Pydna has been designed to be understandable for biologists with limited programming skills by providing interfaces that are semantically similar to the description of molecular biology unit operations found in literature. Conclusions: Pydna simplifies both the planning and sharing of cloning strategies and is especially useful for complex or combinatorial DNA molecule construction. An important difference compared to existing tools with similar goals is the use of Python instead of a specifically constructed language, providing a simulation environment that is more flexible and extensible by the user.
    Acknowledgements: Thanks to Dr. Aric Hagberg, Los Alamos National Laboratory, U.S.A., and Sergio Simoes, Universidade de Sao Paulo, Brazil, for help with NetworkX and graph theory in general. Thanks to Henrik Bengtsson, Dept of Epidemiology & Biostatistics, University of California San Francisco, U.S.A., for critical reading of the manuscript. Thanks to the 2013 Bioinformatics 6605 N4 students A. Coelho, A. Faria, A. Neves, D. Yelshyna and E. Costa for testing. This work was supported by the Fundacao para a Ciencia e Tecnologia (FCT) [PTDC/AAC-AMB/120940/2010, EXPL/BBB-BIO/1772/2013]; and the FEDER POFC-COMPETE [PEst-C/BIA/UI4050/2011]. FA and GR were supported by FCT fellowships [SFRH/BD/80934/2011 and SFRH/BD/42565/2007, respectively].

    Microbiome profiling by Illumina sequencing of combinatorial sequence-tagged PCR products

    We developed a low-cost, high-throughput microbiome profiling method that uses combinatorial sequence tags attached to PCR primers that amplify the rRNA V6 region. Amplified PCR products are sequenced using an Illumina paired-end protocol to generate millions of overlapping reads. Combinatorial sequence tagging can be used to examine hundreds of samples with far fewer primers than is required when sequence tags are incorporated at only a single end. The number of reads generated permitted saturating or near-saturating analysis of samples of the vaginal microbiome. The large number of reads allowed an in-depth analysis of errors, and we found that PCR-induced errors composed the vast majority of non-organism derived species variants, an observation that has significant implications for sequence clustering of similar high-throughput data. We show that the short reads are sufficient to assign organisms to the genus or species level in most cases. We suggest that this method will be useful for the deep sequencing of any short nucleotide region that is taxonomically informative; these include the V3 and V5 regions of the bacterial 16S rRNA genes and the eukaryotic V9 region that is gaining popularity for sampling protist diversity. Comment: 28 pages, 13 figures
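The primer saving from combinatorial tagging follows from simple counting: with f forward tags and r reverse tags, f × r samples can be distinguished using f + r tagged primers, whereas single-end tagging needs a distinct tagged primer for every sample. The sketch below illustrates this; the tag sequences, sample names and function names are hypothetical and not taken from the paper.

```python
from itertools import product
from typing import Dict, List, Tuple


def assign_tag_pairs(samples: List[str],
                     forward_tags: List[str],
                     reverse_tags: List[str]) -> Dict[Tuple[str, str], str]:
    """Give each sample a unique (forward tag, reverse tag) combination."""
    if len(samples) > len(forward_tags) * len(reverse_tags):
        raise ValueError("not enough tag combinations for the samples")
    pairs = product(forward_tags, reverse_tags)
    return dict(zip(pairs, samples))


def primers_needed(n_forward: int, n_reverse: int) -> Tuple[int, int]:
    """Return (combinatorial, single-end) tagged-primer counts for full capacity."""
    return n_forward + n_reverse, n_forward * n_reverse


if __name__ == "__main__":
    # Hypothetical tags, for illustration only.
    fwd = ["ACGT", "TGCA", "GATC"]
    rev = ["CCAA", "GGTT"]
    samples = [f"s{i}" for i in range(1, 7)]
    mapping = assign_tag_pairs(samples, fwd, rev)
    print(mapping[("TGCA", "CCAA")])   # sample for one tag pair read off a paired-end read
    print(primers_needed(len(fwd), len(rev)))
```

The gap widens quickly: under the same counting, 16 forward and 16 reverse tags cover 256 samples with 32 tagged primers.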

    e-MIR2: a public online inventory of medical informatics resources

    Background. Over the last years, the number of available informatics resources in medicine has grown exponentially. While specific inventories of such resources have already begun to be developed for Bioinformatics (BI), comparable inventories are not yet available for the Medical Informatics (MI) field, so that locating and accessing them currently remains a hard and time-consuming task. Description. We have created a repository of MI resources from the scientific literature, providing free access to its contents through a web-based service. Relevant information describing the resources is automatically extracted from manuscripts published in top-ranked MI journals. We used a pattern matching approach to detect the resources' names and their main features. Detected resources are classified according to three different criteria: functionality, resource type and domain. To facilitate these tasks, we have built three different taxonomies by following a novel approach based on folksonomies and social tagging. We adopted the terminology most frequently used by MI researchers in their publications to create the concepts and hierarchical relationships belonging to the taxonomies. The classification algorithm identifies the categories associated with resources and annotates them accordingly. The database is then populated with this data after manual curation and validation. Conclusions. We have created an online repository of MI resources to assist researchers in locating and accessing the most suitable resources to perform specific tasks. The database contained 282 resources at the time of writing. We are continuing to expand the number of available resources by taking into account further publications as well as suggestions from users and resource developers.

    TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets

    Background: Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because of the potential for downstream inaccuracies introduced by unwanted sequence contaminations, it is important to use reliable tools for pre-processing sequence data. Results: TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences allowing insertions and deletions in the dataset. TagCleaner is designed to filter the trimmed reads for duplicates, short reads, and reads with high rates of ambiguous sequences. An additional screening for and splitting of fragment-to-fragment concatenations that gave rise to artificial concatenated sequences can increase the quality of the dataset. Users may modify the different filter parameters according to their own preferences. Conclusions: TagCleaner is a publicly available web application that is able to automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface facilitates export functionality for subsequent data processing, and is available at http://edwards.sdsu.edu/tagcleaner.
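Removing a known 5' tag while tolerating insertions and deletions can be sketched with a small edit-distance computation. The abstract does not describe TagCleaner's own algorithm, so this is a generic illustration: the function name, the `max_edits` parameter and the example sequences are hypothetical, and ambiguous bases and unknown tags are not handled.

```python
def trim_5prime_tag(read: str, tag: str, max_edits: int = 1) -> str:
    """Remove `tag` from the start of `read` if it matches within `max_edits` edits.

    Edits are substitutions, insertions and deletions (Levenshtein distance).
    The read is returned unchanged when no prefix matches closely enough.
    """
    # Only prefixes up to len(tag) + max_edits can match within the edit budget.
    window = read[: len(tag) + max_edits]
    # prev[j] holds the edit distance between tag[:i] and window[:j].
    prev = list(range(len(window) + 1))
    for i, t in enumerate(tag, start=1):
        cur = [i]
        for j, r in enumerate(window, start=1):
            cur.append(min(prev[j] + 1,              # base of the tag missing from the read
                           cur[j - 1] + 1,           # extra base in the read
                           prev[j - 1] + (t != r)))  # match or substitution
        prev = cur
    # Best prefix length; on ties the shortest prefix wins, so fewer bases are removed.
    best_j = min(range(len(prev)), key=lambda j: prev[j])
    if prev[best_j] <= max_edits:
        return read[best_j:]
    return read


if __name__ == "__main__":
    print(trim_5prime_tag("ACGTACGGG", "ACGTAC"))   # exact tag
    print(trim_5prime_tag("ACGACTTTT", "ACGTAC"))   # tag with one deleted base
    print(trim_5prime_tag("TTTTTTTTT", "ACGTAC"))   # no tag present
```

Raising `max_edits` makes the trimming more tolerant of sequencing errors but increases the chance of clipping genuine sequence that happens to resemble the tag.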